Cryptography and Security 35
☆ Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
We study the ability of Transformer models to learn sequences generated by
Permuted Congruential Generators (PCGs), a widely used family of pseudo-random
number generators (PRNGs). PCGs introduce substantial additional difficulty
over linear congruential generators (LCGs) by applying a series of bit-wise
shifts, XORs, rotations and truncations to the hidden state. We show that
Transformers can nevertheless successfully perform in-context prediction on
unseen sequences from diverse PCG variants, in tasks that are beyond published
classical attacks. In our experiments we scale moduli up to $2^{22}$ using up
to $50$ million model parameters and datasets with up to $5$ billion tokens.
Surprisingly, we find that even when the output is truncated to a single bit,
the model can still predict it reliably. When multiple distinct PRNGs are presented
together during training, the model can jointly learn them, identifying
structures from different permutations. We demonstrate a scaling law with
modulus $m$: the number of in-context sequence elements required for
near-perfect prediction grows as $\sqrt{m}$. For larger moduli, optimization
enters extended stagnation phases; in our experiments, learning moduli $m \geq
2^{20}$ requires incorporating training data from smaller moduli, demonstrating
a critical necessity for curriculum learning. Finally, we analyze embedding
layers and uncover a novel clustering phenomenon: the model spontaneously
groups the integer inputs into bitwise rotationally-invariant clusters,
revealing how representations can transfer from smaller to larger moduli.
comment: 10+13 pages, 8+19 figures
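The PCG construction described in the abstract (an LCG hidden state followed by bit-wise shifts, XORs, rotations, and truncation) can be sketched in a few lines. The constants, state width, and output permutation below are illustrative choices, not the paper's exact setup:

```python
# Toy sketch of a Permuted Congruential Generator (PCG):
# an LCG hidden state update followed by an output permutation
# (xorshift, rotation, truncation). Constants are illustrative.

MOD = 1 << 16             # modulus m = 2^16
MULT, INC = 25173, 13849  # classic small LCG constants (demo only)

def rotr8(x, r):
    """Rotate an 8-bit value right by r bits."""
    return ((x >> r) | (x << (8 - r))) & 0xFF

def pcg_next(state):
    """Advance the LCG state, then emit a permuted, truncated output."""
    state = (state * MULT + INC) % MOD
    x = state ^ (state >> 7)                   # xorshift mixes high bits down
    out = rotr8((x >> 4) & 0xFF, state >> 13)  # rotate by the top state bits
    return state, out

state = 42
seq = []
for _ in range(8):
    state, out = pcg_next(state)
    seq.append(out)
print(seq)  # observer sees only permuted, truncated outputs
```

The point of the permutation step is that an observer of `seq` never sees the raw LCG state, which is what makes prediction substantially harder than for a plain LCG.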
☆ Toward Automated Security Risk Detection in Large Software Using Call Graph Analysis
Threat modeling plays a critical role in the identification and mitigation of
security risks; however, manual approaches are often labor intensive and prone
to error. This paper investigates the automation of software threat modeling
through the clustering of call graphs using density-based and community
detection algorithms, followed by an analysis of the threats associated with
the identified clusters. The proposed method was evaluated through a case study
of the Splunk Forwarder Operator (SFO), wherein selected clustering metrics
were applied to the software's call graph to assess pertinent code-density
security weaknesses. The results demonstrate the viability of the approach and
underscore its potential to facilitate systematic threat assessment. This work
contributes to the advancement of scalable, semi-automated threat modeling
frameworks tailored for modern cloud-native environments.
☆ A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication
Semantic communication (SemCom) aims to transmit only task-relevant
information, thereby improving communication efficiency but also exposing
semantic information to potential eavesdropping. In this paper, we propose a
deep reinforcement learning (DRL)-empowered multi-level jamming approach to
enhance the security of SemCom systems over MIMO fading wiretap channels. This
approach combines semantic layer jamming, achieved by encoding task-irrelevant
text, and physical layer jamming, achieved by encoding random Gaussian noise.
These two-level jamming signals are superposed with task-relevant semantic
information to protect the transmitted semantics from eavesdropping. A deep
deterministic policy gradient (DDPG) algorithm is further introduced to
dynamically design and optimize the precoding matrices for both task-relevant
semantic information and multi-level jamming signals, aiming to enhance the
legitimate user's image reconstruction while degrading the eavesdropper's
performance. To jointly train the SemCom model and the DDPG agent, we propose
an alternating optimization strategy where the two modules are updated
iteratively. Experimental results demonstrate that, compared with both the
encryption-based (ESCS) and encoded jammer-based (EJ) benchmarks, our method
achieves comparable security while improving the legitimate user's peak
signal-to-noise ratio (PSNR) by up to approximately 0.6 dB.
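The superposed transmit signal described in the abstract can be sketched as follows; the dimensions and precoding matrices are arbitrary stand-ins (in the paper, the precoders are optimized by the DDPG agent):

```python
import numpy as np

# Toy sketch of the multi-level jamming transmit signal: task-relevant
# semantics plus a semantic-layer jamming stream (encoded task-irrelevant
# text) and a physical-layer Gaussian jamming stream, each shaped by its
# own precoding matrix before superposition. All values here are random
# placeholders; the paper optimizes the precoders with DDPG.

rng = np.random.default_rng(1)
Nt, d = 4, 2                       # transmit antennas, symbol dimension
s_sem = rng.normal(size=(d, 1))    # task-relevant semantic symbols
s_jam = rng.normal(size=(d, 1))    # semantic-layer jamming symbols
n_jam = rng.normal(size=(d, 1))    # physical-layer Gaussian jamming

P_sem, P_jam, P_noise = (rng.normal(size=(Nt, d)) for _ in range(3))

# Superposition of all three streams on the transmit antennas.
x = P_sem @ s_sem + P_jam @ s_jam + P_noise @ n_jam
print(x.shape)  # (4, 1)
```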
☆ A Comprehensive Evaluation and Practice of System Penetration Testing
With the rapid advancement of information technology, the complexity of
applications continues to increase, and the cybersecurity challenges we face
are also escalating. This paper aims to investigate the methods and practices
of system security penetration testing, exploring how to enhance system
security through systematic penetration testing processes and technical
approaches. It also examines existing penetration tools, analyzing their
strengths, weaknesses, and applicable domains to guide penetration testers in
tool selection. Furthermore, based on the penetration testing process outlined
in this paper, appropriate tools are selected to replicate attack processes
using target ranges and target machines. Finally, through practical case
analysis, lessons learned from successful attacks are summarized to inform
future research.
☆ Interdependent Privacy in Smart Homes: Hunting for Bystanders in Privacy Policies
Smart home devices such as video doorbells and security cameras are becoming
increasingly common in everyday life. While these devices offer convenience and
safety, they also raise new privacy concerns: how these devices affect others,
like neighbors, visitors, or people passing by. This issue is generally known
as interdependent privacy, where one person's actions (or inaction) may impact
the privacy of others, and, specifically, bystander privacy in the context of
smart homes. Given lax data protection regulations in terms of shared physical
spaces and amateur joint data controllers, we expect that the privacy policies
of smart home products reflect the missing regulatory incentives. This paper
presents a focused privacy policy analysis of 20 video doorbell and smart
camera products, concentrating explicitly on the bystander aspect. We show that
although some of the vendors acknowledge bystanders, they address the issue only
the extent of including disclaimers, shifting the ethical responsibility for
collecting the data of non-users to the device owner. In addition, we identify
and examine real-world cases related to bystander privacy, demonstrating how
current deployments can impact non-users. Based on our findings, we analyze
vendor privacy policies in light of existing legal frameworks and technical
capabilities, and we provide practical recommendations for both policy language
and system design to enhance transparency and empower both bystanders and
device owners.
comment: 18 pages, 2 figures
☆ CyberNER: A Harmonized STIX Corpus for Cybersecurity Named Entity Recognition
Extracting structured intelligence via Named Entity Recognition (NER) is
critical for cybersecurity, but the proliferation of datasets with incompatible
annotation schemas hinders the development of comprehensive models. While
combining these resources is desirable, we empirically demonstrate that naively
concatenating them results in a noisy label space that severely degrades model
performance. To overcome this critical limitation, we introduce CyberNER, a
large-scale, unified corpus created by systematically harmonizing four
prominent datasets (CyNER, DNRTI, APTNER, and Attacker) onto the STIX 2.1
standard. Our principled methodology resolves semantic ambiguities and
consolidates over 50 disparate source tags into 21 coherent entity types. Our
experiments show that models trained on CyberNER achieve a substantial
performance gain, with a relative F1-score improvement of approximately 30%
over the naive concatenation baseline. By publicly releasing the CyberNER
corpus, we provide a crucial, standardized benchmark that enables the creation
and rigorous comparison of more robust and generalizable entity extraction
models for the cybersecurity domain.
comment: Accepted for publication at the 24th IEEE International Conference on
Trust, Security and Privacy in Computing and Communications (IEEE TrustCom
2025)
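Schema harmonization of the kind the abstract describes can be sketched as a tag-mapping step; the source tags and their STIX 2.1 targets below are hypothetical examples, not the paper's actual mapping:

```python
# Hypothetical sketch of NER schema harmonization: re-labeling tags
# from disparate source datasets onto a unified STIX 2.1-style label
# set. The tag names and mapping below are illustrative only.

TAG_MAP = {
    # CyNER-style tags
    "Malware": "malware",
    "System": "infrastructure",
    # DNRTI-style tags
    "HackOrg": "threat-actor",
    "Tool": "tool",
    # APTNER-style tags
    "APT": "threat-actor",
    "VULNAME": "vulnerability",
}

def harmonize(tokens_with_tags):
    """Re-label (token, source_tag) pairs; unknown tags fall back to 'O'."""
    return [(tok, TAG_MAP.get(tag, "O")) for tok, tag in tokens_with_tags]

sample = [("APT29", "APT"), ("used", "O"), ("Mimikatz", "Tool")]
print(harmonize(sample))
```

Note how two source tags (`HackOrg`, `APT`) collapse into one unified type, which is the kind of consolidation that turns 50+ disparate tags into a coherent label space.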
☆ SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification
The rapid advancement of deep neural networks (DNNs) heavily relies on
large-scale, high-quality datasets. However, unauthorized commercial use of
these datasets severely violates the intellectual property rights of dataset
owners. Existing backdoor-based dataset ownership verification methods suffer
from inherent limitations: poison-label watermarks are easily detectable due to
label inconsistencies, while clean-label watermarks face high technical
complexity and failure on high-resolution images. Moreover, both approaches
employ static watermark patterns that are vulnerable to detection and removal.
To address these issues, this paper proposes a sample-specific clean-label
backdoor watermarking (i.e., SSCL-BW). By training a U-Net-based watermarked
sample generator, this method generates unique watermarks for each sample,
fundamentally overcoming the vulnerability of static watermark patterns. The
core innovation lies in designing a composite loss function with three
components: target sample loss ensures watermark effectiveness, non-target
sample loss guarantees trigger reliability, and perceptual similarity loss
maintains visual imperceptibility. During ownership verification, black-box
testing is employed to check whether suspicious models exhibit predefined
backdoor behaviors. Extensive experiments on benchmark datasets demonstrate the
effectiveness of the proposed method and its robustness against potential
watermark removal attacks.
comment: 8 pages, 9 figures
☆ A Survey of Heterogeneous Graph Neural Networks for Cybersecurity Anomaly Detection
Anomaly detection is a critical task in cybersecurity, where identifying
insider threats, access violations, and coordinated attacks is essential for
ensuring system resilience. Graph-based approaches have become increasingly
important for modeling entity interactions, yet most rely on homogeneous and
static structures, which limits their ability to capture the heterogeneity and
temporal evolution of real-world environments. Heterogeneous Graph Neural
Networks (HGNNs) have emerged as a promising paradigm for anomaly detection by
incorporating type-aware transformations and relation-sensitive aggregation,
enabling more expressive modeling of complex cyber data. However, current
research on HGNN-based anomaly detection remains fragmented, with diverse
modeling strategies, limited comparative evaluation, and an absence of
standardized benchmarks. To address this gap, we provide a comprehensive survey
of HGNN-based anomaly detection methods in cybersecurity. We introduce a
taxonomy that classifies approaches by anomaly type and graph dynamics, analyze
representative models, and map them to key cybersecurity applications. We also
review commonly used benchmark datasets and evaluation metrics, highlighting
their strengths and limitations. Finally, we identify key open challenges
related to modeling, data, and deployment, and outline promising directions for
future research. This survey aims to establish a structured foundation for
advancing HGNN-based anomaly detection toward scalable, interpretable, and
practically deployable solutions.
comment: 37 pages, 4 figures, 86 references. Submitted to Journal of Computer
Security (under review)
☆ PVMark: Enabling Public Verifiability for LLM Watermarking Schemes
Watermarking schemes for large language models (LLMs) have been proposed to
identify the source of generated text, mitigating potential threats arising
from model theft. However, current watermarking solutions hardly resolve the
trust issue: non-public watermark detection cannot prove that it is conducted
faithfully. We attribute this to the secret key used in watermark detection --
it cannot be made public, or an adversary could use it to launch removal
attacks; nor can it remain private, or the detection is opaque to the public.
To resolve the
dilemma, we propose PVMark, a plugin based on zero-knowledge proof (ZKP),
enabling the watermark detection process to be publicly verifiable by third
parties without disclosing any secret key. PVMark hinges upon the proof of
`correct execution' of watermark detection on which a set of ZKP constraints
are built, including mapping, random number generation, comparison, and
summation. We implement multiple variants of PVMark in Python, Rust and Circom,
covering combinations of three watermarking schemes, three hash functions, and
four ZKP protocols, to show our approach effectively works under a variety of
circumstances. Experimental results show that PVMark efficiently enables public
verifiability on the state-of-the-art LLM watermarking schemes yet without
compromising the watermarking performance, promising to be deployed in
practice.
comment: This work has been submitted to the IEEE for possible publication
☆ Who Grants the Agent Power? Defending Against Instruction Injection via Task-Centric Access Control CCS 2025
Yifeng Cai, Ziming Wang, Zhaomeng Deng, Mengyu Yao, Junlin Liu, Yutao Hu, Ziqi Zhang, Yao Guo, Ding Li
AI agents with GUI understanding and Model Context Protocol capabilities are
increasingly deployed to automate mobile tasks. However, their reliance on
over-privileged, static permissions creates a critical vulnerability:
instruction injection. Malicious instructions, embedded in otherwise benign
content like emails, can hijack the agent to perform unauthorized actions. We
present AgentSentry, a lightweight runtime task-centric access control
framework that enforces dynamic, task-scoped permissions. Instead of granting
broad, persistent permissions, AgentSentry dynamically generates and enforces
minimal, temporary policies aligned with the user's specific task (e.g.,
register for an app), revoking them upon completion. We demonstrate that
AgentSentry successfully prevents an instruction injection attack, where an
agent is tricked into forwarding private emails, while allowing the legitimate
task to complete. Our approach highlights the urgent need for intent-aligned
security models to safely govern the next generation of autonomous agents.
comment: SaTS 2025 (Co-located with ACM CCS 2025)
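A minimal sketch of task-scoped permissioning in the spirit of AgentSentry, with invented class, task, and action names (the actual policy format is not specified in the abstract):

```python
# Hypothetical sketch of dynamic, task-scoped access control for an
# agent: permissions are granted against one specific task, checked at
# action time, and revoked when the task completes. All names invented.

class TaskScope:
    def __init__(self, task, allowed_actions):
        self.task = task
        self.allowed = set(allowed_actions)
        self.active = True

    def check(self, action):
        """Permit an action only while the scope is active and in policy."""
        return self.active and action in self.allowed

    def complete(self):
        """Revoke every permission once the task is done."""
        self.active = False

scope = TaskScope("register for an app", {"open_app", "fill_form", "tap_submit"})
assert scope.check("fill_form")            # legitimate task step
assert not scope.check("forward_email")    # injected instruction blocked
scope.complete()
assert not scope.check("fill_form")        # permissions gone after the task
```

The key design point is that an injected instruction (e.g., "forward my emails") falls outside the minimal policy generated for the user's stated task, so it is denied without any content-level filtering.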
☆ Who Moved My Transaction? Uncovering Post-Transaction Auditability Vulnerabilities in Modern Super Apps CCS 2025
Junlin Liu, Zhaomeng Deng, Ziming Wang, Mengyu Yao, Yifeng Cai, Yutao Hu, Ziqi Zhang, Yao Guo, Ding Li
Super apps are the cornerstones of modern digital life, embedding financial
transactions into nearly every aspect of daily routine. The prevailing security
paradigm for these platforms is overwhelmingly focused on pre-transaction
authentication, preventing unauthorized payments before they occur. We argue
that a critical vulnerability vector has been largely overlooked: the fragility
of post-transaction audit trails. We investigate the ease with which a user can
permanently erase their transaction history from an app's interface, thereby
concealing unauthorized or sensitive activities from the account owner. To
quantify this threat, we conducted an empirical study with 6 volunteers who
performed a cross-evaluation on six super apps. Our findings are alarming: all
six applications studied allow users to delete transaction records, yet a
staggering five out of six (83\%) fail to protect these records with strong
authentication. Only one app in our study required biometric verification for
deletion. This study provides the first concrete evidence of this
near-ubiquitous vulnerability, demonstrating a critical gap in the current
mobile security landscape and underscoring the urgent need for a paradigm shift
towards ensuring post-transaction audit integrity.
comment: SaTS 2025 (Co-Located with ACM CCS 2025)
☆ Confidential FRIT via Homomorphic Encryption
Edge computing alleviates the computation burden of data-driven control in
cyber-physical systems (CPSs) by offloading complex processing to edge servers.
However, the increasing sophistication of cyberattacks underscores the need for
security measures that go beyond conventional IT protections and address the
unique vulnerabilities of CPSs. This study proposes a confidential data-driven
gain-tuning framework using homomorphic encryption, such as ElGamal and CKKS
encryption schemes, to enhance cybersecurity in gain-tuning processes
outsourced to external servers. The key idea for realizing confidential FRIT
(Fictitious Reference Iterative Tuning) is to
replace the matrix inversion operation with a vector summation form, allowing
homomorphic operations to be applied. Numerical examples under 128-bit security
confirm performance comparable to conventional methods while providing
guidelines for selecting suitable encryption schemes for secure CPS.
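The homomorphic property such schemes rely on can be illustrated with textbook ElGamal, which is multiplicatively homomorphic: the component-wise product of two ciphertexts encrypts the product of the plaintexts. The toy parameters below are insecure and purely illustrative; the paper's 128-bit setting uses far larger groups (or CKKS for approximate real arithmetic):

```python
import random

# Toy textbook ElGamal over a tiny prime group, illustrating the
# multiplicative homomorphism E(a) * E(b) = E(a * b). Parameters are
# deliberately small and insecure -- demonstration only.

p = 467          # small prime (insecure, demo only)
g = 2            # group element
x = 127          # secret key
h = pow(g, x, p) # public key

def encrypt(m):
    r = random.randrange(1, p - 1)
    return (pow(g, r, p), m * pow(h, r, p) % p)

def decrypt(c):
    c1, c2 = c
    # c1^(p-1-x) = c1^(-x) mod p by Fermat's little theorem
    return c2 * pow(c1, p - 1 - x, p) % p

def hmul(ca, cb):
    """Component-wise ciphertext product encrypts the plaintext product."""
    return (ca[0] * cb[0] % p, ca[1] * cb[1] % p)

ca, cb = encrypt(5), encrypt(7)
assert decrypt(hmul(ca, cb)) == 35  # 5 * 7 computed under encryption
```

This is why reformulating the gain-tuning computation as sums and products (rather than a matrix inversion) matters: only those operations are available homomorphically.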
☆ Security Risk of Misalignment between Text and Image in Multi-modal Model
Despite the notable advancements and versatility of multi-modal diffusion
models, such as text-to-image models, their susceptibility to adversarial
inputs remains underexplored. Contrary to expectations, our investigations
reveal that the alignment between textual and image modalities in existing
diffusion models is inadequate. This misalignment presents significant risks,
especially in the generation of inappropriate or Not-Safe-For-Work (NSFW)
content. To this end, we propose a novel attack called Prompt-Restricted
Multi-modal Attack (PReMA) to manipulate the generated content by modifying the
input image in conjunction with any specified prompt, without altering the
prompt itself. PReMA is the first attack that manipulates model outputs by
solely creating adversarial images, distinguishing itself from prior methods
that primarily generate adversarial prompts to produce NSFW content.
Consequently, PReMA poses a novel threat to the integrity of multi-modal
diffusion models, particularly in image-editing applications that operate with
fixed prompts. Comprehensive evaluations conducted on image inpainting and
style transfer tasks across various models confirm the potent efficacy of
PReMA.
☆ Security Vulnerabilities in AI-Generated Code: A Large-Scale Analysis of Public GitHub Repositories
This paper presents a comprehensive empirical analysis of security
vulnerabilities in AI-generated code across public GitHub repositories. We
collected and analyzed 7,703 files explicitly attributed to four major AI
tools: ChatGPT (91.52\%), GitHub Copilot (7.50\%), Amazon CodeWhisperer
(0.52\%), and Tabnine (0.46\%). Using CodeQL static analysis, we identified
4,241 Common Weakness Enumeration (CWE) instances across 77 distinct
vulnerability types. Our findings reveal that while 87.9\% of AI-generated code
does not contain identifiable CWE-mapped vulnerabilities, significant patterns
emerge regarding language-specific vulnerabilities and tool performance. Python
consistently exhibited higher vulnerability rates (16.18\%-18.50\%) compared to
JavaScript (8.66\%-8.99\%) and TypeScript (2.50\%-7.14\%) across all tools. We
observed notable differences in security performance, with GitHub Copilot
achieving better security density for Python (1,739 LOC per CWE) and
TypeScript, while ChatGPT performed better for JavaScript. Additionally, we
discovered widespread use of AI tools for documentation generation (39\% of
collected files), an understudied application with implications for software
maintainability. These findings extend previous work with a significantly
larger dataset and provide valuable insights for developing language-specific
and context-aware security practices for the responsible integration of
AI-generated code into software development workflows.
comment: This preprint has not undergone peer review or any post-submission
improvements or corrections. The Version of Record of this contribution is
published in Volume 16219 of the Lecture Notes in Computer Science series,
and is available online at https://doi.org/10.1007/978-981-95-3537-8_9
☆ PEEL: A Poisoning-Exposing Encoding Theoretical Framework for Local Differential Privacy
Lisha Shuai, Jiuling Dong, Nan Zhang, Shaofeng Tan, Haokun Zhang, Zilong Song, Gaoya Dong, Xiaolong Yang
Local Differential Privacy (LDP) is a widely adopted privacy-protection model
in the Internet of Things (IoT) due to its lightweight, decentralized, and
scalable nature. However, it is vulnerable to poisoning attacks, and existing
defenses either incur prohibitive resource overheads or rely on domain-specific
prior knowledge, limiting their practical deployment. To address these
limitations, we propose PEEL, a Poisoning-Exposing Encoding theoretical
framework for LDP, which departs from resource- or prior-dependent
countermeasures and instead leverages the inherent structural consistency of
LDP-perturbed data. As a non-intrusive post-processing module, PEEL amplifies
stealthy poisoning effects by re-encoding LDP-perturbed data via
sparsification, normalization, and low-rank projection, thereby revealing both
output and rule poisoning attacks through structural inconsistencies in the
reconstructed space. Theoretical analysis proves that PEEL, integrated with
LDP, retains unbiasedness and statistical accuracy, while being robust to
expose both output and rule poisoning attacks. Moreover, evaluation results
show that LDP-integrated PEEL not only outperforms four state-of-the-art
defenses in terms of poisoning exposure accuracy but also significantly reduces
client-side computational costs, making it highly suitable for large-scale IoT
deployments.
comment: 14 pages, 1 figure
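The three re-encoding steps the abstract names (sparsification, normalization, low-rank projection) can be sketched on a batch of perturbed report vectors; the top-k threshold and target rank below are arbitrary demo choices, not PEEL's actual parameters:

```python
import numpy as np

# Illustrative sketch of the three re-encoding steps named by PEEL --
# sparsification, normalization, and low-rank projection -- applied to
# rows of (LDP-perturbed) report vectors. Parameters are demo choices.

def reencode(X, keep=4, rank=2):
    # 1) Sparsification: keep only the |keep| largest-magnitude entries
    #    per row, zeroing the rest.
    small = np.argsort(np.abs(X), axis=1)[:, :-keep]
    Xs = X.copy()
    np.put_along_axis(Xs, small, 0.0, axis=1)
    # 2) Normalization: scale each row to unit L2 norm.
    norms = np.linalg.norm(Xs, axis=1, keepdims=True)
    Xn = Xs / np.maximum(norms, 1e-12)
    # 3) Low-rank projection: reconstruct from the top singular directions,
    #    so structural outliers show up as large residuals.
    U, s, Vt = np.linalg.svd(Xn, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

X = np.random.default_rng(0).normal(size=(6, 8))
Z = reencode(X)
print(Z.shape)  # (6, 8), but with rank-2 structure
```

Poisoned reports that break the structural consistency of honest LDP outputs would then stand out as rows poorly explained by the low-rank reconstruction.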
☆ ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models NeurIPS 2025
Recent advances in Audio-Language Models (ALMs) have significantly improved
multimodal understanding capabilities. However, the introduction of the audio
modality also brings new and unique vulnerability vectors. Previous studies
have proposed jailbreak attacks that specifically target ALMs, revealing that
defenses directly transferred from traditional audio adversarial attacks or
text-based Large Language Model (LLM) jailbreaks are largely ineffective
against these ALM-specific threats. To address this issue, we propose ALMGuard,
the first defense framework tailored to ALMs. Based on the assumption that
safety-aligned shortcuts naturally exist in ALMs, we design a method to
identify universal Shortcut Activation Perturbations (SAPs) that serve as
triggers that activate the safety shortcuts to safeguard ALMs at inference
time. To better sift out effective triggers while preserving the model's
utility on benign tasks, we further propose Mel-Gradient Sparse Mask (M-GSM),
which restricts perturbations to Mel-frequency bins that are sensitive to
jailbreaks but insensitive to speech understanding. Both theoretical analyses
and empirical results demonstrate the robustness of our method against both
seen and unseen attacks. Overall, ALMGuard reduces the average success rate
of advanced ALM-specific jailbreak attacks to 4.6% across four models, while
maintaining comparable utility on benign benchmarks, establishing it as the new
state of the art. Our code and data are available at
https://github.com/WeifeiJin/ALMGuard.
comment: Accepted to NeurIPS 2025
☆ SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
The ability of LLM agents to plan and invoke tools exposes them to new safety
risks, making a comprehensive red-teaming system crucial for discovering
vulnerabilities and ensuring their safe deployment. We present SIRAJ: a generic
red-teaming framework for arbitrary black-box LLM agents. We employ a dynamic
two-step process that starts with an agent definition and generates diverse
seed test cases that cover various risk outcomes, tool-use trajectories, and
risk sources. Then, it iteratively constructs and refines model-based
adversarial attacks based on the execution trajectories of former attempts. To
optimize the red-teaming cost, we present a model distillation approach that
leverages structured forms of a teacher model's reasoning to train smaller
models that are equally effective. Across diverse evaluation agent settings,
our seed test case generation approach yields a 2--2.5x boost to the coverage
of risk outcomes and tool-calling trajectories. Our distilled 8B red-teamer
model improves the attack success rate by 100%, surpassing the 671B DeepSeek-R1
model. Our ablations and analyses validate the effectiveness of the iterative
framework, structured reasoning, and the generalization of our red-teamer
models.
♻ ☆ TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
Text embedding inversion attacks reconstruct original sentences from latent
representations, posing severe privacy threats in collaborative inference and
edge computing. We propose TextCrafter, an optimization-based adversarial
perturbation mechanism that combines RL learned, geometry aware noise injection
orthogonal to user embeddings with cluster priors and PII signal guidance to
suppress inversion while preserving task utility. Unlike prior defenses either
non learnable or agnostic to perturbation direction, TextCrafter provides a
directional protective policy that balances privacy and utility. Under strong
privacy setting, TextCrafter maintains 70 percentage classification accuracy on
four datasets and consistently outperforms Gaussian/LDP baselines across lower
privacy budgets, demonstrating a superior privacy utility trade off.
comment: More sufficient and convincing experiments are needed
♻ ☆ GSE: Group-wise Sparse and Explainable Adversarial Attacks
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal
pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts
have replaced this norm with a structural sparsity regularizer, such as the
nuclear group norm, to craft group-wise sparse adversarial attacks. The
resulting perturbations are thus explainable and hold significant practical
relevance, shedding light on an even greater vulnerability of DNNs. However,
crafting such attacks poses an optimization challenge, as it involves computing
norms for groups of pixels within a non-convex objective. We address this by
presenting a two-phase algorithm that generates group-wise sparse attacks
within semantically meaningful areas of an image. Initially, we optimize a
quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored
for non-convex programming. Subsequently, the algorithm transitions to a
projected Nesterov's accelerated gradient descent with $2$-norm regularization
applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and
ImageNet datasets demonstrate a remarkable increase in group-wise sparsity,
e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted
attack). This performance improvement is accompanied by significantly faster
computation times, improved explainability, and a $100\%$ attack success rate.
♻ ☆ Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
When using a public communication channel -- whether formal or informal, such
as commenting or posting on social media -- end users have no expectation of
privacy: they compose a message and broadcast it for the world to see. Even if
an end user takes utmost precautions to anonymize their online presence --
using an alias or pseudonym; masking their IP address; spoofing their
geolocation; concealing their operating system and user agent; deploying
encryption; registering with a disposable phone number or email; disabling
non-essential settings; revoking permissions; and blocking cookies and
fingerprinting -- one obvious element still lingers: the message itself.
Assuming they avoid lapses in judgment or accidental self-exposure, there
should be little evidence to validate their actual identity, right? Wrong. The
content of their message -- necessarily open for public consumption -- exposes
an attack vector: stylometric analysis, or author profiling. In this paper, we
dissect the technique of stylometry, discuss an antithetical counter-strategy
in adversarial stylometry, and devise enhancements through Unicode
steganography.
comment: 33 pages, 7 figures, 3 tables
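A minimal sketch of the kind of Unicode steganography the paper builds on: hiding a bit string in zero-width characters appended to ordinary text. Real schemes are more elaborate; this only illustrates the carrier:

```python
# Minimal zero-width Unicode steganography sketch: each secret bit is
# carried by an invisible character (U+200B zero-width space for 0,
# U+200C zero-width non-joiner for 1) appended to the cover text.

ZW0, ZW1 = "\u200b", "\u200c"

def hide(cover, secret):
    """Append the secret's bits as zero-width characters."""
    bits = "".join(f"{b:08b}" for b in secret.encode())
    payload = "".join(ZW1 if bit == "1" else ZW0 for bit in bits)
    return cover + payload  # renders identically to the cover text

def reveal(stego):
    """Recover the hidden bytes from the zero-width characters."""
    bits = "".join("1" if ch == ZW1 else "0"
                   for ch in stego if ch in (ZW0, ZW1))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode()

msg = hide("An ordinary public comment.", "id")
assert reveal(msg) == "id"
```

In the adversarial-stylometry setting, such invisible characters can perturb the character-level features a stylometric classifier relies on without visibly changing the message.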
♻ ☆ Measuring the Availability and Response Times of Public Encrypted DNS Resolvers
Unencrypted DNS traffic between users and DNS resolvers can lead to privacy
and security concerns. In response to these privacy risks, many browser vendors
have deployed DNS-over-HTTPS (DoH) to encrypt queries between users and DNS
resolvers. Today, many client-side deployments of DoH, particularly in
browsers, select between only a few resolvers, despite the fact that many more
encrypted DNS resolvers are deployed in practice. Unfortunately, if users only
have a few choices of encrypted resolver, and only a few perform well from any
particular vantage point, then the privacy problems that DoH was deployed to
help address merely shift to a different set of third parties. It is thus
important to assess the performance characteristics of more encrypted DNS
resolvers, to determine how many options for encrypted DNS resolvers users tend
to have in practice. In this paper, we explore the performance of a large group
of encrypted DNS resolvers supporting DoH by measuring DNS query response times
from global vantage points in North America, Europe, and Asia. Our results show
that many non-mainstream resolvers have higher response times than mainstream
resolvers, particularly for non-mainstream resolvers that are queried from more
distant vantage points -- suggesting that most encrypted DNS resolvers are not
replicated or anycast. In some cases, however, certain non-mainstream resolvers
perform at least as well as mainstream resolvers, suggesting that users may be
able to use a broader set of encrypted DNS resolvers than those that are
available in current browser configurations.
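For context, a DoH client sends a wire-format DNS message (RFC 8484) over HTTPS to the resolver's `/dns-query` endpoint; a minimal sketch of that message encoding, with no network I/O performed:

```python
import struct

# Building a wire-format DNS query of the kind a DoH client POSTs with
# content-type application/dns-message. Encoding sketch only; measuring
# response times would additionally require timed HTTPS requests.

def build_query(name, qtype=1, qid=0):  # qtype 1 = A record
    header = struct.pack(">HHHHHH",
                         qid,     # transaction ID (0 aids DoH cacheability)
                         0x0100,  # flags: standard query, recursion desired
                         1, 0, 0, 0)  # QDCOUNT=1, AN/NS/AR counts = 0
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode()
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)  # QCLASS=1 (IN)

q = build_query("example.com")
print(len(q))  # 12-byte header + 13-byte QNAME + 4-byte QTYPE/QCLASS = 29
```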
♻ ☆ A Survey of Internet Censorship and its Measurement: Methodology, Trends, and Challenges
Internet censorship limits the access of nodes residing within a specific
network environment to the public Internet, and vice versa. During the last
decade, techniques for conducting Internet censorship have been developed
further. Consequently, methodology for measuring Internet censorship has
improved as well.
In this paper, we first provide a survey of network-level Internet
censorship techniques. Secondly, we survey censorship measurement methodology.
We further cover the censorship of circumvention tools and its measurement, as
well as available datasets. In cases where it is beneficial, we bridge the
terminology and taxonomy of Internet censorship with related domains, namely
traffic obfuscation and information hiding. We further extend the technical
perspective with recent trends and challenges, including human aspects of
Internet censorship.
comment: Appeared in Computers & Security (Elsevier, 2025)
♻ ☆ Cybersecurity threat detection based on a UEBA framework using Deep Autoencoders
User and Entity Behaviour Analytics (UEBA) is a broad branch of data
analytics that attempts to build a normal behavioural profile in order to
detect anomalous events. Among the techniques used to detect anomalies, Deep
Autoencoders constitute one of the most promising deep learning models on UEBA
tasks, allowing explainable detection of security incidents that could lead to
the leak of personal data, hijacking of systems, or access to sensitive
business information. In this study, we introduce the first implementation of
an explainable UEBA-based anomaly detection framework that leverages Deep
Autoencoders in combination with Doc2Vec to process both numerical and textual
features. Additionally, based on the theoretical foundations of neural
networks, we offer a novel proof demonstrating the equivalence of two widely
used definitions for fully-connected neural networks. The experimental results
demonstrate the proposed framework's capability to effectively detect real and
synthetic anomalies generated from real attack data, showing that the models
not only correctly identify anomalies but also provide explainable results
that enable the reconstruction of an anomaly's possible origin.
Our findings suggest that the proposed UEBA framework can be seamlessly
integrated into enterprise environments, complementing existing security
systems for explainable threat detection.
comment: Published in AIMS Mathematics (2025), 10(10): 23496-23517. DOI:
10.3934/math.20251043
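The core detection principle behind autoencoder-based UEBA can be sketched in a few lines: score each event by its reconstruction error and flag events whose error exceeds a threshold learned from normal behaviour. This is an illustrative stand-in, not the paper's implementation; `reconstruction_error` would in practice compare an event vector against a trained Deep Autoencoder's output, and all values below are made up.

```python
# Minimal sketch of reconstruction-error anomaly scoring, the principle
# behind autoencoder-based UEBA detection. All names and numbers are
# illustrative; a real system would use a trained Deep Autoencoder.
from statistics import mean, stdev

def reconstruction_error(x, x_hat):
    """Mean squared error between an event vector and its reconstruction."""
    return mean((a - b) ** 2 for a, b in zip(x, x_hat))

def fit_threshold(train_errors, k=3.0):
    """Classic mu + k*sigma rule over errors observed on normal traffic."""
    return mean(train_errors) + k * stdev(train_errors)

def is_anomalous(error, threshold):
    return error > threshold

# Toy usage: small errors on normal events, a large error on an outlier.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.010]
thr = fit_threshold(normal_errors)
print(is_anomalous(0.010, thr))  # typical event -> False
print(is_anomalous(0.50, thr))   # outlying event -> True
```

Explainability in such a pipeline typically comes from inspecting which input features contribute most to the reconstruction error of a flagged event.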
♻ ☆ Multiple Proposer Transaction Fee Mechanism Design: Robust Incentives Against Censorship and Bribery
Censorship resistance is one of the core value propositions of blockchains. A
recurring design pattern aimed at providing censorship resistance is enabling
multiple proposers to contribute inputs to block construction. Notably,
Fork-Choice Enforced Inclusion Lists (FOCIL) have been proposed for inclusion
in Ethereum. However, the current proposal relies on altruistic behavior, without
a Transaction Fee Mechanism (TFM). This study aims to address this gap by
exploring how multiple proposers should be rewarded to incentivize censorship
resistance. The main contribution of this work is the identification of TFMs
that ensure censorship resistance under bribery attacks, while also satisfying
the incentive compatibility properties of EIP-1559. We provide a concrete
payment mechanism for FOCIL, along with generalizable contributions to the
literature by analyzing 1) incentive compatibility of TFMs in the presence of a
bribing adversary, 2) TFMs in protocols with multiple phases of transaction
inclusion, and 3) TFMs of protocols in which parties are uncertain about the
behavior and the possible bribe of others.
comment: This work has been submitted to the IEEE for possible publication
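For context on the incentive-compatibility baseline the abstract invokes, EIP-1559's base-fee update rule can be sketched as follows. This follows the public EIP-1559 specification (integer arithmetic, change denominator of 8), not the paper's proposed FOCIL payment mechanism; the example values are illustrative.

```python
# Sketch of the EIP-1559 base-fee update rule, the incentive-compatibility
# baseline referenced above. Integer arithmetic per the spec's
# BASE_FEE_MAX_CHANGE_DENOMINATOR = 8; values below are illustrative.
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8

def next_base_fee(base_fee: int, gas_used: int, gas_target: int) -> int:
    delta = (base_fee * abs(gas_used - gas_target)
             // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    if gas_used > gas_target:
        return base_fee + max(delta, 1)  # spec enforces a minimum 1-wei rise
    if gas_used < gas_target:
        return base_fee - delta
    return base_fee

# A full block (2x target) raises the fee by 12.5%; an empty block lowers it.
print(next_base_fee(1_000_000_000, 30_000_000, 15_000_000))  # 1_125_000_000
print(next_base_fee(1_000_000_000, 0, 15_000_000))           # 875_000_000
```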
♻ ☆ Network Oblivious Transfer via Noisy Channels: Limits and Capacities
In this paper, we aim to study the information-theoretical limits of
oblivious transfer. This work also investigates the problem of oblivious
transfer over a noisy multiple access channel involving two non-colluding
senders and a single receiver. The channel model is characterized by
correlations among the parties, with the parties assumed to be either
honest-but-curious or, in the receiver's case, potentially malicious. We first
study the information-theoretic limits of oblivious transfer between two
parties and then extend them to the multiple access channel model. We propose
a multiparty protocol for honest-but-curious parties in which the general
multiple access channel is reduced to a certain correlation. In scenarios
where the receiver is malicious, we characterize an achievable rate region.
comment: 33 pages, 3 figures. A short version of this paper appeared at
ISIT 2025
♻ ☆ VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference
Decentralized inference provides a scalable and resilient paradigm for
serving large language models (LLMs), enabling distributed resource utilization
and reducing reliance on centralized providers. However, in a permissionless
environment without trusted nodes, ensuring the correctness of model outputs
remains a core challenge. We introduce VeriLLM, a publicly verifiable protocol
for decentralized LLM inference that achieves security under a
one-honest-verifier assumption while maintaining practical efficiency. VeriLLM
combines lightweight empirical rerunning with cryptographic commitments,
allowing verifiers to validate results at approximately 1% of the underlying
inference cost. To prevent verification bottlenecks, we design an isomorphic
inference-verification architecture that multiplexes both inference and
verification roles across the same GPU workers. This design (i) improves GPU
utilization and overall throughput, (ii) enlarges the effective validator set,
enhancing robustness and liveness, and (iii) enforces task indistinguishability
to prevent node-specific optimizations or selective behavior. Through
theoretical analysis and system-level evaluation, we show that VeriLLM achieves
reliable public verifiability with minimal overhead, offering a practical
foundation for trustworthy and scalable decentralized LLM inference.
comment: 20 pages, 4 figures, 6 tables
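The "lightweight empirical rerunning with cryptographic commitments" pattern can be illustrated with a toy commit-and-spot-check sketch: the worker commits to its token outputs with a hash, and a verifier reruns a small random sample (on the order of 1% of positions) to check consistency. All function names here are hypothetical stand-ins, not VeriLLM's actual protocol or API.

```python
# Hypothetical commit-and-spot-check sketch of the verification pattern
# described above. The worker commits to its output; a verifier reruns
# ~1% of positions. Names are illustrative, not the paper's protocol.
import hashlib
import random

def commit(tokens):
    """Hash commitment over the claimed output token sequence."""
    return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

def spot_check(claimed_tokens, rerun_token_at, fraction=0.01, seed=0):
    """Rerun a random ~fraction of positions and compare to the claim."""
    rng = random.Random(seed)
    n = len(claimed_tokens)
    sample = rng.sample(range(n), max(1, int(n * fraction)))
    return all(rerun_token_at(i) == claimed_tokens[i] for i in sample)

tokens = list(range(1000))               # stand-in for model output tokens
c = commit(tokens)                       # published by the inference worker
honest = spot_check(tokens, lambda i: tokens[i])  # honest rerun matches
print(honest)  # True
```

A single spot-check catches a cheating worker only probabilistically; the one-honest-verifier assumption and a large multiplexed validator set are what make detection reliable in aggregate.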
♻ ☆ The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems
With the growing adoption of retrieval-augmented generation (RAG) systems,
various attack methods have been proposed to degrade their performance.
However, most existing approaches rely on unrealistic assumptions in which
external attackers have access to internal components such as the retriever. To
address this issue, we introduce a realistic black-box attack based on the RAG
paradox, a structural vulnerability arising from the system's effort to enhance
trust by revealing both the retrieved documents and their sources to users.
This transparency enables attackers to observe which sources are used and how
information is phrased, allowing them to craft poisoned documents that are more
likely to be retrieved and upload them to the identified sources. Moreover, as
RAG systems directly provide retrieved content to users, these documents must
not only be retrievable but also appear natural and credible to maintain user
confidence in the search results. Unlike prior work that focuses solely on
improving document retrievability, our attack method explicitly considers both
retrievability and user trust in the retrieved content. Both offline and online
experiments demonstrate that our method significantly degrades system
performance without internal access, while generating natural-looking poisoned
documents.
♻ ☆ The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs
Songyang Liu, Chaozhuo Li, Jiameng Qiu, Xi Zhang, Feiran Huang, Litian Zhang, Yiming Hei, Philip S. Yu
With the rapid advancement of artificial intelligence, Large Language Models
(LLMs) have shown remarkable capabilities in Natural Language Processing (NLP),
including content generation, human-computer interaction, machine translation,
and code generation. However, their widespread deployment has also raised
significant safety concerns. In particular, LLM-generated content can exhibit
unsafe behaviors such as toxicity, bias, or misinformation, especially in
adversarial contexts, which has attracted increasing attention from both
academia and industry. Although numerous studies have attempted to evaluate
these risks, a comprehensive and systematic survey on safety evaluation of LLMs
is still lacking. This work aims to fill this gap by presenting a structured
overview of recent advances in safety evaluation of LLMs. Specifically, we
propose a four-dimensional taxonomy: (i) Why to evaluate, which explores the
background of safety evaluation of LLMs, how it differs from general LLM
evaluation, and the significance of such evaluation; (ii) What to evaluate,
which examines and categorizes existing safety evaluation tasks based on key
capabilities, including dimensions such as toxicity, robustness, ethics, bias
and fairness, truthfulness, and related aspects; (iii) Where to evaluate, which
summarizes the evaluation metrics, datasets and benchmarks currently used in
safety evaluations; (iv) How to evaluate, which reviews existing mainstream
evaluation methods based on the roles of the evaluators and some evaluation
frameworks that integrate the entire evaluation pipeline. Finally, we identify
the challenges in safety evaluation of LLMs and propose promising research
directions to promote further advancement in this field. We emphasize the
necessity of prioritizing safety evaluation to ensure the reliable and
responsible deployment of LLMs in real-world applications.
comment: 20 pages, preprint
♻ ☆ IRCopilot: Automated Incident Response with Large Language Models
Incident response plays a pivotal role in mitigating the impact of cyber
attacks. In recent years, the intensity and complexity of global cyber threats
have grown significantly, making it increasingly challenging for traditional
threat detection and incident response methods to operate effectively in
complex network environments. While Large Language Models (LLMs) have shown
great potential in early threat detection, their capabilities remain limited
when it comes to automated incident response after an intrusion. To address
this gap, we construct an incremental benchmark based on real-world incident
response tasks to thoroughly evaluate the performance of LLMs in this domain.
Our analysis reveals several key challenges that hinder the practical
application of contemporary LLMs, including context loss, hallucinations,
privacy protection concerns, and their limited ability to provide accurate,
context-specific recommendations. In response to these challenges, we propose
IRCopilot, a novel framework for automated incident response powered by LLMs.
IRCopilot mimics the three dynamic phases of a real-world incident response
team using four collaborative LLM-based session components. These components
are designed with clear divisions of responsibility, reducing issues such as
hallucinations and context loss. Our method leverages diverse prompt designs
and strategic responsibility segmentation, significantly improving the system's
practicality and efficiency. Experimental results demonstrate that IRCopilot
outperforms baseline LLMs across key benchmarks, achieving sub-task completion
rates of 150%, 138%, 136%, 119%, and 114% for various response tasks. Moreover,
IRCopilot exhibits robust performance on public incident response platforms and
in real-world attack scenarios, showcasing its strong applicability.
♻ ☆ Security Modelling for Cyber-Physical Systems: A Systematic Literature Review
Cyber-physical systems are at the intersection of digital technology and
engineering domains, rendering them high-value targets of sophisticated and
well-funded cybersecurity threat actors. Prominent cybersecurity attacks on CPS
have brought attention to the vulnerability of these systems and the inherent
weaknesses of critical infrastructure reliant on them. Security modelling for
CPS is an important mechanism to systematically identify and assess
vulnerabilities, threats, and risks throughout system life cycles, and to
ultimately ensure system resilience, safety, and reliability. This survey
delves into state-of-the-art research on CPS security modelling, encompassing
both threat and attack modelling. While these terms are sometimes used
interchangeably, they are different concepts. This paper elaborates on the
differences between threat and attack modelling, examining their implications
for CPS security. We conducted a systematic search that yielded 449 papers,
from which 32 were selected and categorised into three clusters: those focused
on threat modelling methods, attack modelling methods, and literature reviews.
Specifically, we sought to examine what security modelling methods exist today,
and how they address real-world cybersecurity threats and CPS-specific attacker
capabilities throughout the life cycle of CPS, which typically span longer
durations compared to traditional IT systems. This paper also highlights
several limitations in existing research, wherein security models adopt
simplistic approaches that do not adequately consider the dynamic, multi-layer,
multi-path, and multi-agent characteristics of real-world cyber-physical
attacks.
comment: Accepted by ACM Transactions on Cyber-Physical Systems (TCPS)
♻ ☆ Evaluating Argon2 Adoption and Effectiveness in Real-World Software
Modern password hashing remains a critical defense against credential
cracking, yet the transition from theoretically secure algorithms to robust
real-world implementations remains fraught with challenges. This paper presents
a dual analysis of Argon2, the Password Hashing Competition winner, combining
attack simulations quantifying how parameter configurations impact guessing
costs under realistic budgets, with the first large-scale empirical study of
Argon2 adoption across public GitHub software repositories. Our economic model,
validated against cryptocurrency mining benchmarks, demonstrates that OWASP's
recommended 46 MiB configuration reduces compromise rates by 42.5% compared to
SHA-256 at \$1/account attack budgets for strong user passwords. However,
memory-hardness exhibits diminishing returns: increasing the allocation to RFC
9106's 2048 MiB provides just 23.3% (\$1) and 17.7% (\$20) additional
protection despite 44.5 times greater memory demands. Crucially, both
configurations fail to mitigate risks from weak passwords, with 96.9-99.8%
compromise rates for RockYou-like credentials regardless of algorithm choice.
Our repository analysis shows accelerating Argon2 adoption, yet weak
configuration practices: 46.6% of deployments use weaker-than-OWASP parameters.
Surprisingly, sensitive applications (password managers, encryption tools) show
no stronger configurations than general software. Our findings highlight that a
secure algorithm alone cannot ensure security; effective parameter guidance and
developer education remain essential for realizing Argon2's theoretical
advantages.
comment: This preprint has not undergone peer review or any post-submission
improvements or corrections. The Version of Record of this contribution is
published in Volume 15993 of the Lecture Notes in Computer Science series,
and is available online at https://doi.org/10.1007/978-3-032-00627-1_2
♻ ☆ Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models ACL
Large Language Models (LLMs), especially those accessed via APIs, have
demonstrated impressive capabilities across various domains. However, users
without technical expertise often turn to (untrustworthy) third-party services,
such as prompt engineering, to enhance their LLM experience, creating
vulnerabilities to adversarial threats like backdoor attacks.
Backdoor-compromised LLMs generate malicious outputs to users when inputs
contain specific "triggers" set by attackers. Traditional defense strategies,
originally designed for small-scale models, are impractical for API-accessible
LLMs due to limited model access, high computational costs, and data
requirements. To address these limitations, we propose Chain-of-Scrutiny (CoS)
which leverages LLMs' unique reasoning abilities to mitigate backdoor attacks.
It guides the LLM to generate reasoning steps for a given input and scrutinizes
them for consistency with the final output -- any inconsistency indicates a
potential attack. It is well suited to the popular API-only LLM deployments,
enabling detection at minimal cost and with little data. User-friendly and
driven by natural language, it allows non-experts to perform the defense
independently while maintaining transparency. We validate the effectiveness of
CoS through extensive experiments on various tasks and LLMs, with results
showing greater benefits for more powerful LLMs.
comment: This paper has been accepted to ACL Findings 2025
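The consistency check at the heart of this defense can be illustrated with a toy sketch: if the conclusion reached by the model's own reasoning chain disagrees with its final answer, the response is flagged as potentially backdoored. The function below is a hypothetical stand-in; in CoS the reasoning is elicited from the LLM itself and the comparison is richer than string matching.

```python
# Toy illustration of the Chain-of-Scrutiny consistency check. In the real
# defense, reasoning steps come from the LLM and are scrutinized against
# the final output; this string comparison is a simplified stand-in.
def scrutinize(final_answer: str, reasoning_conclusion: str) -> bool:
    """Return True when the reasoning's conclusion matches the final answer
    (consistent, no backdoor suspected); False flags a potential attack."""
    return final_answer.strip().lower() == reasoning_conclusion.strip().lower()

print(scrutinize("Positive", "positive"))  # consistent -> True
print(scrutinize("Negative", "positive"))  # mismatch   -> flag as suspicious
```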
♻ ☆ Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features ICCV 2025
The ability of deep neural networks (DNNs) comes from extracting and
interpreting features from the data provided. By exploiting intermediate
features in DNNs instead of relying on hard labels, we craft adversarial
perturbations that generalize more effectively, boosting black-box
transferability. In previous work, these features have ubiquitously come from
supervised learning. Inspired by the exceptional synergy between self-supervised
learning and the Transformer architecture, this paper explores whether
exploiting self-supervised Vision Transformer (ViT) representations can improve
adversarial transferability. We present dSVA -- a generative dual
self-supervised ViT features attack, that exploits both global structural
features from contrastive learning (CL) and local textural features from masked
image modeling (MIM), the self-supervised learning paradigm duo for ViTs. We
design a novel generative training framework that incorporates a generator to
create black-box adversarial examples, and strategies to train the generator by
exploiting joint features and the attention mechanism of self-supervised ViTs.
Our findings show that CL and MIM enable ViTs to attend to distinct feature
tendencies, which, when exploited in tandem, yield strong adversarial
generalizability. By disrupting the dual deep features distilled by
self-supervised ViTs, we achieve remarkable black-box transferability to models
of various architectures, outperforming the state of the art. Code available
at https://github.com/spencerwooo/dSVA.
comment: 14 pages, 9 figures, accepted at ICCV 2025
♻ ☆ Model Provenance Testing for Large Language Models
Large language models are increasingly customized through fine-tuning and
other adaptations, creating challenges in enforcing licensing terms and
managing downstream impacts. Tracking model origins is crucial both for
protecting intellectual property and for identifying derived models when biases
or vulnerabilities are discovered in foundation models. We address this
challenge by developing a framework for testing model provenance: whether one
model is derived from another. Our approach is based on the key observation
that real-world model derivations preserve significant similarities in model
outputs that can be detected through statistical analysis. Using only black-box
access to models, we employ multiple hypothesis testing to compare model
similarities against a baseline established by unrelated models. On two
comprehensive real-world benchmarks spanning models from 30M to 4B parameters
and comprising over 600 models, our tester achieves 90-95% precision and 80-90%
recall in identifying derived models. These results demonstrate the viability
of systematic provenance verification in production environments even when only
API access is available.
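The key observation, that derived models preserve detectable output similarities, can be sketched with a toy agreement test: compare a candidate's black-box outputs against a base model's and flag derivation when agreement clearly exceeds what unrelated model pairs exhibit. The threshold rule below is a simplified stand-in for the paper's multiple hypothesis testing, and all values are illustrative.

```python
# Illustrative sketch of black-box provenance testing via output agreement.
# A simple threshold stands in for the paper's multiple hypothesis testing;
# the baseline and margin values are made up for the example.
def agreement_rate(outputs_a, outputs_b):
    matches = sum(a == b for a, b in zip(outputs_a, outputs_b))
    return matches / len(outputs_a)

def likely_derived(candidate, base, unrelated_baseline=0.35, margin=0.15):
    """Flag derivation when agreement clearly exceeds the unrelated baseline."""
    return agreement_rate(candidate, base) > unrelated_baseline + margin

base      = ["a", "b", "c", "a", "d", "b", "a", "c"]
finetuned = ["a", "b", "c", "a", "d", "x", "a", "c"]  # mostly preserved
unrelated = ["x", "y", "c", "z", "d", "x", "y", "z"]  # chance-level overlap
print(likely_derived(finetuned, base))  # True  (7/8 agreement)
print(likely_derived(unrelated, base))  # False (2/8 agreement)
```

In practice the baseline is estimated empirically from a pool of known-unrelated models, and significance is assessed across many prompts with multiple-testing correction.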
♻ ☆ Improving LLM Safety Alignment with Dual-Objective Optimization ICML 2025
Existing training-time safety alignment techniques for large language models
(LLMs) remain vulnerable to jailbreak attacks. Direct preference optimization
(DPO), a widely deployed alignment method, exhibits limitations in both
experimental and theoretical contexts as its loss function proves suboptimal
for refusal learning. Through gradient-based analysis, we identify these
shortcomings and propose an improved safety alignment that disentangles DPO
objectives into two components: (1) robust refusal training, which encourages
refusal even when partial unsafe generations are produced, and (2) targeted
unlearning of harmful knowledge. This approach significantly increases LLM
robustness against a wide range of jailbreak attacks, including prefilling,
suffix, and multi-turn attacks across both in-distribution and
out-of-distribution scenarios. Furthermore, we introduce a method to emphasize
critical refusal tokens by incorporating a reward-based token-level weighting
mechanism for refusal learning, which further improves the robustness against
adversarial exploits. Our research also suggests that robustness to jailbreak
attacks is correlated with token distribution shifts in the training process
and internal representations of refusal and harmful tokens, offering valuable
directions for future research in LLM safety alignment. The code is available
at https://github.com/wicai24/DOOR-Alignment
comment: ICML 2025